Chapter 6
Taking All Kinds of Samples
IN THIS CHAPTER
Grasping the concept of statistical error
Setting up your sampling frame
Executing a sampling strategy
Sampling, or taking a sample, is an important concept in statistics. As described in Chapter 3, the
purpose of taking a sample (a group of individuals drawn from a population) and measuring just the
sample is to avoid having to conduct a census and measure the whole population. Instead, you
can measure just the sample and use statistical approaches to make inferences about the whole, a
practice called inferential statistics. You estimate a measurement of the entire population, which is
called a parameter, by calculating a statistic from your sample.
Some samples do a better job than others at representing the population from which they are drawn.
We begin this chapter by digging more deeply into some important concepts related to sampling. We
then describe specific sampling approaches and discuss their pros and cons.
Making Forgivable (and Non-Forgivable) Errors
A central concept in statistics is that of error. In statistics, the term error sometimes means what you
think it means — that a mistake has been made. In those cases, the statistician should take steps to
avoid the error. But other times in statistics, the term error refers to a phenomenon that is unavoidable,
and as statisticians, we just have to cope with it.
For example, imagine that you had a list of all the patients of a particular clinic and their current ages.
Suppose that you calculated the average age of the patients on your list, and your answer was 43.7
years. That would be a population parameter. Now, let’s say you took a random sample of 20 patients
from that list and calculated the mean age of the sample, which would be a sample statistic. Do you
think you would get exactly 43.7 years? Although it is certainly possible, in all likelihood the mean of
your sample (the statistic) would be a different number from the mean of your population (the
parameter). The difference between a sample statistic and the population parameter, which arises
because you measured only a sample rather than the whole population, is called sampling error.
Sampling error is unavoidable, and as statisticians, we are forced to accept it.
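To make the clinic example concrete, here is a minimal Python sketch. The patient list is made up for illustration: the 500 patients, the age range of 18 to 90, and the seed are all assumptions, so the population mean will not be the 43.7 years used above. The sketch computes the parameter from the whole list, computes a statistic from a random sample of 20, and then repeats the sampling to show how the statistic moves from sample to sample.

```python
import random
import statistics

rng = random.Random(6)  # fixed seed (arbitrary) so repeated runs give the same numbers

# Hypothetical clinic list: 500 patients with made-up ages between 18 and 90.
population_ages = [rng.randint(18, 90) for _ in range(500)]

# Population parameter: the mean age of every patient on the list.
parameter = statistics.mean(population_ages)

# Sample statistic: the mean age of one simple random sample of 20 patients.
sample_ages = rng.sample(population_ages, 20)
statistic = statistics.mean(sample_ages)

# Sampling error for this one sample: statistic minus parameter.
sampling_error = statistic - parameter
print(f"parameter = {parameter:.2f}, statistic = {statistic:.2f}, "
      f"sampling error = {sampling_error:+.2f}")

# Draw 1,000 more samples of 20 to see how the error varies between samples.
errors = [statistics.mean(rng.sample(population_ages, 20)) - parameter
          for _ in range(1000)]
print(f"smallest error = {min(errors):+.2f}, largest error = {max(errors):+.2f}")
```

Nothing went wrong in any of these samples. Each one was drawn fairly from the full list, yet the sample means still land above or below the population mean. That spread is sampling error.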
Now, to describe the other type of error, let’s add some drama. Suppose that when you went to take a
sample of those 20 patients, you spilled coffee on the list so you could not read some of the names. The
names blotted out by the coffee were therefore ineligible to be selected for your sample. This is unfair
to the names under the coffee stain — they have a zero probability of being selected for your sample,
even though they are part of the population from which you are sampling. This is called
undercoverage, and it is considered a type of non-sampling error. Non-sampling error is essentially a
mistake: something goes wrong during sampling that you should try to avoid. And unlike